
National Center for Education Statistics



U.S. Department of Education 

Institute of Education Sciences 
NCES 2006-029 



Comparing Mathematics Content in the National Assessment of Educational Progress (NAEP),
Trends in International Mathematics and Science Study (TIMSS), and Program for
International Student Assessment (PISA) 2003 Assessments



Technical Report 



May 2006 



Teresa Smith Neidorf 

NAEP Education Statistics Services Institute 
American Institutes for Research 



Marilyn Binkley 

National Center for Education Statistics 

Kim Gattis 

NAEP Education Statistics Services Institute 
American Institutes for Research 



David Nohara 

Independent Consultant 



U.S. Department of Education 

Margaret Spellings 
Secretary 

Institute of Education Sciences 

Grover J. Whitehurst 
Director 

National Center for Education Statistics 

Mark Schneider 
Commissioner 

The National Center for Education Statistics (NCES) is the primary federal entity for collecting, analyzing, and 
reporting data related to education in the United States and other nations. It fulfills a congressional mandate to collect, 
collate, analyze, and report full and complete statistics on the condition of education in the United States; conduct and 
publish reports and specialized analyses of the meaning and significance of such statistics; assist state and local 
education agencies in improving their statistical systems; and review and report on education activities in foreign 
countries. 

NCES activities are designed to address high priority education data needs; provide consistent, reliable, complete, and 
accurate indicators of education status and trends; and report timely, useful, and high quality data to the U.S. 
Department of Education, the Congress, the states, other education policymakers, practitioners, data users, and the 
general public. 

We strive to make our products available in a variety of formats and in language that is appropriate to a variety of 
audiences. You, as our customer, are the best judge of our success in communicating information effectively. If you 
have any comments or suggestions about this or any other NCES product or report, we would like to hear from you. 
Please direct your comments to: 

National Center for Education Statistics 
Institute of Education Sciences 
U.S. Department of Education 
1990 K Street NW 
Washington, DC 20006 



May 2006 

The NCES World Wide Web Home Page is http://nces.ed.gov.

The NCES World Wide Web Electronic Catalog is http://nces.ed.gov/pubsearch.

Suggested Citation 

Neidorf, T.S., Binkley, M., Gattis, K., and Nohara, D. (2006). Comparing Mathematics Content in the National 
Assessment of Educational Progress (NAEP), Trends in International Mathematics and Science Study (TIMSS), and 
Program for International Student Assessment (PISA) 2003 Assessments (NCES 2006-029). U.S. Department of 
Education. Washington, DC: National Center for Education Statistics. Retrieved [date] from 
http://nces.ed.gov/pubsearch.

For ordering information on this report, write to 

U.S. Department of Education 
ED Pubs 
P.O. Box 1398 
Jessup, MD 20794-1398 

or call toll free 1-877-4ED-Pubs or order online at http://www.edpubs.org.

Content Contacts: 

Marilyn Binkley 
(202) 502-7484 
Marilyn.Binkley@ed.gov



Elois Scott 
(202) 502-7489 
Elois.Scott@ed.gov 



Executive Summary 



The National Center for Education Statistics (NCES) collects information on student 
performance in key subject areas through the National Assessment of Educational Progress (NAEP), 
as well as through participation in international studies of student achievement. Information from 
these studies is used to inform policymakers, educators, researchers, and the public about the 
knowledge and skills of U.S. students and how these compare with students in other countries. 

This technical report describes a study that was undertaken to compare the content of three 
mathematics assessments conducted in 2003: the NAEP fourth- and eighth-grade assessments; the 
Trends in International Mathematics and Science Study (TIMSS), which also assessed mathematics 
at the fourth- and eighth-grade levels; and the Program for International Student Assessment (PISA), 
which assessed the mathematical literacy of 15-year-old students. Its aim is to provide information 
useful for interpreting and comparing the results from the three assessments, based on an in-depth 
look at the content of the respective frameworks 1 and assessment items. 

The report draws upon information provided by the developers of the assessments, as well as 
data obtained from an expert panel convened to compare the frameworks and items from the three 
assessments on various dimensions. 2 The frameworks were compared with respect to 

• how each assessment organizes and defines the mathematics content and process skills to be 
assessed at each grade (or age) level; 

• the main content areas included and the set of topics covered in each; and 

• other aspects, such as item format and calculator policy. 

Item comparisons were based on 



• cross-classification of NAEP and TIMSS items to each other’s assessment framework in 
terms of the mathematics content covered and grade-level expectations; 

• classification of PISA items to the NAEP framework on these same dimensions; 

• classification of all items with respect to their level of mathematical complexity; 3 and

• comparisons based on other framework dimensions related to cognitive processes, item 
formats, and item contexts. 



1 Assessment frameworks define what will be assessed, including the content to be covered, the types of test 
questions, and recommendations for how the test is administered. 

2 The panel members — experts in mathematics, mathematics education, and mathematics assessment, with 
familiarity and experience across the three assessments — are listed in appendix C. 

3 Mathematical complexity reflects the demands on thinking that an item makes, assuming that a student is familiar
with the mathematics content of the task. The classifications in this report are based on the definitions in the NAEP 
2005 framework for three levels of mathematical complexity — low, moderate, and high — that form an ordered 
description of the demands an item may make on a student (described in appendix B). 







While comparisons between NAEP, TIMSS, and PISA were focused on the common 
classification systems based on the NAEP framework, the study also included a limited comparison 
between PISA items and NAEP eighth- and twelfth-grade problem solving items in light of the 
dimensions in the PISA framework. Example items are referenced throughout the report to illustrate 
some key similarities and differences. 4 

The results of this study indicate that although the NAEP, TIMSS, and PISA 2003 
mathematics frameworks address many similar topics and require students to use a range of cognitive 
skills and processes, it cannot be assumed that they measure the same content in the same way. A 
hypothetical student who takes all three assessments might indeed perform equally well on them, but 
depending on the curriculum they have been exposed to and their skill and experience in various 
types of mathematical thinking, other students might exhibit quite different levels of performance 
across the three assessments. For NAEP and TIMSS, this is also true within each of the five 
corresponding content areas related to number, measurement, geometry, data, and algebra.

At the overall level, there is apparent agreement between the NAEP and TIMSS frameworks 
on the general boundaries and basic organization of mathematics content across the fourth and eighth 
grades, with nearly all items from each assessment being placed in one of the major content areas of 
the other assessment framework at the broadest level. Furthermore, both NAEP and TIMSS place 
similar emphases on each of the five major content areas, as evidenced by similar distributions of 
items across the main content areas of both frameworks at both the fourth- and eighth-grade levels. 
These types of comparisons, however, do not consider the grade level correspondence or the level of 
content match based on the distribution of items across the specific set of topics and subtopics 
included at each grade level in each of the assessments. 

Despite the similarity between NAEP and TIMSS at the broadest content area level, there are 
differences between the two assessments when considering more detailed comparisons of the 
mathematics content covered and the grade level correspondence between the items in each 
assessment and the intentions of the other assessment framework. Differences between the NAEP 
and TIMSS assessments emerge with more detailed content analyses that consider the level of 
content match to specific topics and subtopics in the other assessment framework, with 20 percent of 
fourth-grade and about 15 percent of eighth-grade items from both assessments not classified to
specific subtopics in the other assessment framework at any grade level. This finding indicates that 
both assessments contain items that might not be included in the other assessment and supports the 
general claim that NAEP and TIMSS do not necessarily assess the same mathematics content. 

Most NAEP and TIMSS items were placed at the same grade on the other assessment 
framework, but this was not always within the corresponding content area. The overall grade-level 
correspondence between the NAEP items and the TIMSS framework, 86 percent at fourth grade and 
73 percent at eighth grade, was lower than that between the TIMSS items and the NAEP framework 
(at least 90 percent). This is related at least in part to the inclusion of cross-grade items in NAEP that 
were administered at multiple grade levels. There are notable differences across content areas in the 
level of grade match between the two assessments. In the TIMSS assessment, measurement and 
geometry account for most of the items classified at different grade levels to the NAEP framework 
(10 percent or more). In the NAEP assessment, the content area of data analysis, statistics, and
probability has the largest percentage of fourth-grade items classified at a higher grade level (almost
half), most notably from items covering probability topics which are not assessed in TIMSS until the
eighth grade. In the NAEP eighth-grade assessment, between 10 and 43 percent of items in all
content areas were classified at the fourth-grade level in TIMSS, with the largest proportion of items
in measurement and geometry and spatial sense (37 and 43 percent, respectively) and the smallest in
data analysis, statistics, and probability (10 percent).

4 Additional released item sets from each assessment are also available on the NAEP, TIMSS, and PISA websites:
http://nces.ed.gov/nationsreportcard, http://isc.bc.edu/timss2003.html, and http://www.pisa.oecd.org.

Within each content area, detailed comparisons of content coverage and grade 
correspondence indicate that NAEP and TIMSS items are not necessarily measuring the same content. 
Some of the differences in topic and subtopic emphasis or grade level match in each content area 
include the following: 

• Number: At the eighth grade, TIMSS has a relatively larger emphasis on ratio, proportion,
and percent compared to whole numbers in NAEP. There is a somewhat greater emphasis on 
computation in TIMSS at both fourth and eighth grades. Only the NAEP eighth-grade 
assessment includes scientific notation (data not shown). 

• Measurement: A larger proportion of NAEP fourth-grade items involve the selection and use
of appropriate measurement instruments and units. While TIMSS has a greater emphasis at
both grades on problems involving properties (area, perimeter, volume, surface area) of two-
and three-dimensional shapes, a number of fourth-grade TIMSS items were classified to the
NAEP eighth-grade framework (16 percent). In each assessment, at least 25 percent of
eighth-grade items were classified at the lower grade level of the other assessment. In
addition, there is an overlap of NAEP measurement items with topics in the TIMSS geometry
framework.

• Geometry: A larger proportion of NAEP items involve two- and three-dimensional shapes,
while TIMSS has a greater emphasis on congruence and similarity. There are differences in 
the nature of problem-solving items (TIMSS with more application of geometric properties 
and NAEP with more use of geometric models). Forty-three percent of NAEP eighth-grade 
items were classified to the TIMSS fourth-grade framework, while 13 percent of TIMSS 
eighth-grade items were classified to the NAEP twelfth-grade framework. 

• Data: NAEP includes probability items in the fourth-grade assessment, while TIMSS does
not include this topic until the eighth grade. In TIMSS, there is a greater emphasis on 
reading and interpreting data in tables and graphs at the fourth grade. In NAEP, there is a 
higher proportion of eighth-grade items involving the organization and display of data. 

• Algebra: TIMSS has a greater emphasis on algebraic expressions and operations at eighth
grade. Some of the eighth-grade NAEP items involving patterns, equations, and functions
were classified to the fourth-grade TIMSS framework (18 percent). There is an overlap of
NAEP algebra and functions topics involving the use of number lines and coordinate systems
with the TIMSS geometry framework, and some TIMSS eighth-grade items were classified to
the NAEP twelfth-grade framework (5 percent).

NAEP and TIMSS appear to be quite similar overall in terms of the distribution of items 
across the low, moderate, and high mathematical complexity levels. Sixty-four percent of fourth-grade
items and more than half of eighth-grade items were classified at the low complexity level and
less than 5 percent were classified at the high complexity level at both grade levels in NAEP and
TIMSS. The content areas with the highest proportion of items (more than 60 percent) classified at
the moderate or high complexity level are algebra and functions in fourth-grade NAEP, data analysis, 
statistics, and probability in eighth-grade NAEP, and measurement in eighth-grade TIMSS. 
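
As an illustration of how percentage distributions of this kind can be tabulated from item-level classifications, the following is a minimal sketch in Python. The function name, the string labels for the levels, and the ten example classifications are hypothetical; they are not data or procedures taken from this study.

```python
from collections import Counter

# The three ordered complexity levels named in the report.
LEVELS = ("low", "moderate", "high")


def percent_distribution(classifications):
    """Return the percentage of items classified at each complexity level."""
    total = len(classifications)
    if total == 0:
        raise ValueError("no items to tabulate")
    counts = Counter(classifications)
    # Counter returns 0 for a level with no items, so every level is reported.
    return {level: 100.0 * counts[level] / total for level in LEVELS}


# Hypothetical classifications for ten items (illustrative only).
example_items = ["low"] * 6 + ["moderate"] * 3 + ["high"] * 1

distribution = percent_distribution(example_items)

# Share of items at the two upper levels combined (moderate or high).
moderate_or_high = distribution["moderate"] + distribution["high"]
```

Combining the moderate and high percentages, as in the last line, mirrors the way the summary reports the share of items at the two upper levels of mathematical complexity.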

PISA stands apart from NAEP and TIMSS in a number of important areas, including the 
organization of its mathematics content framework (which is based on overarching ideas), its focus 
on problem solving in real-world applications, and the fact that it samples students based on age (15- 
year-olds) rather than grade level. Interestingly, PISA items, which are distinct from NAEP and 
TIMSS items in numerous ways, do have a relatively high degree of content match to NAEP 
subtopics from a purely mathematics content perspective (more than 90 percent classified to a NAEP 
subtopic). Grade-level analyses based on classifications to the NAEP content framework also 
indicate that although the target population of PISA is somewhat older than the students taking the 
NAEP and TIMSS eighth-grade assessments, the mathematics content of most of the PISA items (85 
percent) is at the eighth-grade level.

The different nature of PISA makes it complementary to both NAEP and TIMSS. The
mathematics topics addressed may not necessarily be substantially different, although PISA places
greater emphasis on data analysis and less on algebra than do either NAEP or TIMSS; rather, it is how
that content is presented that makes PISA different. In terms of item type and level of mathematical
complexity, PISA is quite different from NAEP and TIMSS. Not only does PISA use multiple-choice
items to a far lesser degree, but it also contains a substantially higher proportion of items (71 percent)
classified at the two upper levels of mathematical complexity (moderate and high).

Differences in the demands that the problem-solving items place on students’ mathematical
thinking skills are also found when comparing PISA items and NAEP eighth- and twelfth-grade
problem solving items with respect to the PISA competency clusters. 5 From the perspective of the
PISA framework, the mathematical thinking skills required by the NAEP problem solving items are
focused more on reproduction and much less on reflection than those required by PISA items. This is consistent with their
different purposes — NAEP being more closely aligned with curriculum-based mathematics outcomes 
at fourth, eighth and twelfth grades and PISA assessing the preparedness of 15-year-olds to be able to 
apply mathematics to solve novel, real-world problems. The situations or contexts 6 involved in the 
NAEP problem solving items also differed from PISA, with NAEP having a relatively higher 
proportion of items focused on educational/occupational and scientific contexts and lower 
proportions involving personal and public contexts than PISA. A number of the NAEP problem 
solving items investigated were judged by the panel as not appropriate for the PISA assessment (due 
to contexts or mathematical applications that were not authentic) or requiring revisions related to the 
level of instructions, general formatting, and sequencing in order to be included in the PISA 
assessment. 

This report illustrates the complementary nature of the assessments, as there are certainly 
cases, especially looking within content areas, where results from NAEP, TIMSS, or PISA might be 
more informative than the others regarding a specific topic or skill. However, as scores are not 
reported at the topic or subtopic level, the ability to use assessment results to make statements about 
these student skills or abilities is limited to performance on individual items. 



5 The PISA framework defines three competency clusters (reproduction, connections, and reflection) to describe
the mathematical cognitive processes required by its mathematics items. These are described in section 2.3. 

6 The PISA framework includes a situations or contexts dimension with four categories — personal, 
educational/occupational, public, and scientific. 







For all three assessments, when reviewing results, it is important to look beyond the overall 
scores and content area subscales and examine in detail what each assessment measures. This study 
has yielded data that can be used to make informed readings of results. While there is no single 
factor that may be related to differences in student performance, the numerous differences noted here, 
whether dramatic or more minor, may have a substantial effect overall. As each assessment program 
continues, this type of research can continue, not only to help explain differences in student scores, 
but also to understand the complementary nature of the three assessments. 

This report provides a first-level comparison of items in each assessment in terms of the 
coverage of broad content areas and distribution across mathematics topics as defined in the 
frameworks. All items in each assessment were considered in order to make overall comparisons of 
content coverage and grade-level expectations as well as distributions with respect to three broad 
levels of mathematical complexity. In addition, the types of item classifications conducted within the 
time constraints of this study permit comparisons at the mathematics topic level for each content area. 
While this method provides a broad view of some of the similarities and differences between the 
assessments, it is limited in terms of the types of comparisons that are provided at the item level. 

More in-depth analyses of the exact nature of the items from each assessment within topics would
reveal other important differences related to difficulty, scope, depth, complexity, and other item 
attributes. These types of more focused comparisons were outside the scope of this study, but may be 
important to include in future comparative studies of the assessments. 




Acknowledgments 



Many people’s contributions made this report possible, and the authors wish to thank all 
those who have assisted with various aspects of the report, including data analysis, reviews, and 
design. 



Members of the expert panel listed in appendix C provided all of the item classification data 
used as the foundation for the results presented in this report. The authors would like to thank the 
panel members for their expertise and contributions to the study. 

Thanks to Patrick Gonzales, Eugene Owen, and Mariann Lemke of the National Center for 
Education Statistics (NCES) for their input on the design and their role in reviewing the report. The 
authors are grateful to NCES technical reviewers, Marilyn Seastrom and Shelley Burns, as well as to
NCES program directors (Andrew Malizio and Elois Scott) and associate commissioners (Peggy Carr 
and Val Plisko) who provided direction and support for this publication. The authors wish to thank 
Lisa Bridges of the Institute of Education Sciences, Christine Brusselmans-Dehairs of Ghent 
University, Jeff Haberstroh of the Educational Testing Service, Marcie Dingle of Synergy 
Enterprises, and two anonymous reviewers for their very helpful comments. 

Several people, formerly or currently from the Education Statistics Services Institute (ESSI), 
provided support in the form of research assistance and/or review: Ben Dalton, Kristy David, Aaron 
Douglas, J. Lane Glenn, Jamie Johnston, Juan Carlos Guzman, Dana Kelly, Janine Emerson, David 
Miller, Melanie Ouellette, Lisette Partelow, Anindita Sen, and Margaret Woodworth. The authors 
wish to thank ESSI technical reviewers, Lauren Gilbertson and Zeyu Xu, for their hard work and 
expertise. Thanks also to David Miller for helping to shepherd the report through the final stages of
the review and publication process, to Heather Block, also of ESSI, for her work on graphic design 
for this report, and to Brian Henigin of Westat for the final publication formatting of the report. 

For permission to use secure items and publish released items, the authors would like to 
thank the Trends in International Mathematics and Science Study (TIMSS) and the International 
Association for the Evaluation of Educational Achievement (IEA), as well as the Program for 
International Student Assessment (PISA) and the Organization for Economic Cooperation and 
Development (OECD). Thanks also to the staff of the Educational Testing Service for participation 
in the expert panel meeting and for supplying item information and review materials from the 
National Assessment of Educational Progress (NAEP). 




Contents 



Executive Summary
Acknowledgments
List of Tables
List of Figures
List of Exhibits
1. Introduction
2. Overview of the Assessments and Their Frameworks
    2.1. NAEP 2003 Mathematics Framework
    2.2. TIMSS 2003 Mathematics Framework
    2.3. PISA 2003 Mathematical Literacy Framework
    2.4. Comparing the NAEP, TIMSS, and PISA Mathematics Frameworks and Assessments
3. Process and Methods
    3.1. Organization of the Expert Panel Meeting
    3.2. Methods Used for NAEP/TIMSS Comparisons
    3.3. Methods Used for NAEP/PISA Comparisons
4. Overall Comparisons
    4.1. Content Coverage
    4.2. Grade Level
    4.3. Levels of Mathematical Complexity
    4.4. Item Format
5. NAEP/TIMSS Comparisons by Main Content Areas
    5.1. Number
    5.2. Measurement
    5.3. Geometry
    5.4. Data
    5.5. Algebra
6. NAEP/PISA Comparisons
    6.1. Content Comparisons Based on NAEP and PISA Frameworks
    6.2. PISA Competency Clusters
    6.3. PISA Situations or Contexts
    6.4. Comparing General Characteristics of NAEP and PISA Items
7. Conclusion
References
Appendix A: Content Framework Summary Documents
Appendix B: Levels of Mathematical Complexity
Appendix C: Expert Panel
Appendix D: Methodological Notes and Supplementary Data
Appendix E: Example Items







List of Tables 

Table

1. Target percentage of NAEP items distributed across NAEP framework dimensions, by grade: 2003
2. Target percentage of TIMSS assessment time distributed across TIMSS framework dimensions, by grade: 2003
3. Target percentage of PISA assessment distributed across PISA framework dimensions: 2003
4. Percentage of NAEP, TIMSS, and PISA mathematics items classified to the content strands in the NAEP 2003 mathematics framework, by grade/age and survey: 2003
5. Percentage of NAEP and TIMSS mathematics items classified to the content domains in the TIMSS 2003 mathematics framework, by grade and survey: 2003
6. Percentage of NAEP, TIMSS, and PISA mathematics items classified to other assessment framework at the topic or subtopic level, by grade/age and survey: 2003
7. Percentage of NAEP single-grade and cross-grade mathematics items classified at each grade level according to the TIMSS mathematics framework: 2003
8. Percentage distribution of NAEP, TIMSS, and PISA mathematics items across levels of mathematical complexity, by grade/age and survey: 2003
9. Percentage distribution of NAEP, TIMSS, and PISA mathematics items across item formats, by grade/age and survey: 2003
10. Percentage of NAEP and TIMSS fourth- and eighth-grade number items classified to the other mathematics assessment framework, by level of content/grade match: 2003
11. Percentage of NAEP and TIMSS fourth- and eighth-grade measurement items classified to the other mathematics assessment framework, by level of content/grade match: 2003
12. Percentage of NAEP and TIMSS fourth- and eighth-grade geometry items classified to the other mathematics assessment framework, by level of content/grade match: 2003
13. Percentage of NAEP and TIMSS fourth- and eighth-grade data items classified to the other mathematics assessment framework, by level of content/grade match: 2003
14. Percentage of NAEP and TIMSS fourth- and eighth-grade algebra items classified to the other mathematics assessment framework, by level of content/grade match: 2003
15. Percentage distribution of PISA mathematics items across NAEP mathematics content strands, by PISA overarching idea: 2003
16. Percentage distribution of NAEP 2003 eighth-grade and NAEP 2000 twelfth-grade mathematics problem solving items across PISA overarching ideas, by NAEP content strand: 2000 and 2003
17. Percentage distribution of PISA 2003 mathematics items and NAEP 2003 eighth-grade and NAEP 2000 twelfth-grade mathematics problem solving items classified to PISA competency clusters, by grade/age: 2000 and 2003
18. Percentage distribution of NAEP 2003 eighth-grade and NAEP 2000 twelfth-grade mathematics problem solving items across PISA competency clusters, by NAEP content strand: 2000 and 2003
19. Percentage distribution of PISA 2003 mathematics items and NAEP 2003 eighth-grade and NAEP 2000 twelfth-grade mathematics problem solving items across PISA situations or contexts categories: 2000 and 2003
D-1. Reliability of mathematical complexity level classifications for mathematics items in NAEP 2003 and TIMSS 2003, by number of comparisons and percentage agreement
D-2. Reliability of mathematical complexity level classifications for mathematics items in NAEP 2003 and TIMSS 2003, by number of items and percentage agreement
D-3. Distribution of PISA 2003 mathematics items across topics within the NAEP 2003 mathematics framework content strands, by PISA overarching idea category







List of Figures 

Figure

1-A. Percentage distribution of NAEP mathematics items classified at each grade level according to the TIMSS mathematics framework, by grade: 2003
1-B. Percentage distribution of TIMSS and PISA mathematics items classified at each grade level according to the NAEP mathematics framework, by grade/age: 2003
2. Percentage of NAEP and TIMSS fourth-grade items classified as moderate or high mathematical complexity level, by mathematics content area: 2003
3. Percentage of NAEP and TIMSS eighth-grade items classified as moderate or high mathematical complexity level, by mathematics content area: 2003
4. Percentage distribution of fourth- and eighth-grade NAEP mathematics items across mathematical complexity levels, by NAEP mathematical ability category: 2003
5. Percentage distribution of fourth- and eighth-grade TIMSS mathematics items across mathematical complexity levels, by TIMSS cognitive domain category: 2003
6. Percentage distribution of PISA mathematics items across mathematical complexity levels, by PISA competency cluster: 2003
7. Percentage distribution of NAEP, TIMSS, and PISA mathematics items across levels of mathematical complexity, by item format: 2003
8. Percentage of NAEP and TIMSS number items classified to number sense, properties, and operations topics in the NAEP mathematics framework, by survey and grade: 2003
9. Percentage of NAEP and TIMSS number items classified to number topics in the TIMSS mathematics framework, by survey and grade: 2003
10. Percentage of NAEP and TIMSS measurement items classified to measurement topics in the NAEP mathematics framework, by survey and grade: 2003
11. Percentage of NAEP and TIMSS measurement items classified to measurement topics in the TIMSS mathematics framework, by survey and grade: 2003
12. Percentage of NAEP and TIMSS geometry items classified to geometry and spatial sense topics in the NAEP mathematics framework, by survey and grade: 2003
13. Percentage of NAEP and TIMSS geometry items classified to geometry topics in the TIMSS mathematics framework, by survey and grade: 2003
14. Percentage of NAEP and TIMSS data items classified to data analysis, statistics, and probability topics in the NAEP mathematics framework, by survey and grade: 2003
15. Percentage of NAEP and TIMSS data items classified to data topics in the TIMSS mathematics framework, by survey and grade: 2003
16. Percentage of NAEP and TIMSS algebra items classified to algebra and functions topics in the NAEP mathematics framework, by survey and grade: 2003
17. Percentage of NAEP and TIMSS algebra items classified to algebra topics in the TIMSS mathematics framework, by survey and grade: 2003





List of Exhibits 

Exhibit

1-A. NAEP mathematics framework dimensions: 2003
1-B. TIMSS mathematics framework dimensions: 2003
1-C. PISA mathematical literacy framework dimensions: 2003
2. Terminology used in making comparisons across the NAEP 2003 and TIMSS 2003 content frameworks
3. Number topics included in the NAEP and TIMSS mathematics frameworks: 2003
4. Measurement topics included in the NAEP and TIMSS mathematics frameworks: 2003
5. Geometry topics included in the NAEP and TIMSS mathematics frameworks: 2003
6. Data topics included in the NAEP and TIMSS mathematics frameworks: 2003
7. Algebra topics included in the NAEP and TIMSS mathematics frameworks: 2003
A-1. NAEP mathematics framework and specifications summary: 2003
A-2. TIMSS mathematics framework and specifications summary: 2003
B-1. Levels of mathematical complexity adapted from the NAEP 2005 mathematics framework
E-1. Index of example items from NAEP 2003, TIMSS 2003, and PISA 2003







1. Introduction 



Researchers, policymakers, educators, and members of the general public interested in the 
achievement of U.S. students currently have available several major sources of national-level data:
results from the U.S. Department of Education’s National Assessment of Educational Progress 
(NAEP) and U.S. results from various international assessments, such as the Progress in International 
Reading Literacy Study (PIRLS), the Program for International Student Assessment (PISA), and the 
Trends in International Mathematics and Science Study (TIMSS). NAEP administers periodic 
assessments in reading, mathematics, science, and other subjects at the fourth, eighth, and twelfth 
grades; TIMSS assesses mathematics and science at fourth and eighth grade; and PIRLS is a reading 
literacy assessment administered to fourth-grade students. In comparison, PISA primarily assesses 
the literacy 1 of 15-year-old students in reading, mathematics and science. In cases where the 
different assessments address the same subject areas (e.g., mathematics, reading, science) at the same 
or similar grade levels, the opportunity exists to measure U.S. student achievement using multiple 
instruments. Comparing results across assessments can be useful not only for interpreting the results, 
but also for developing a more complete picture of student achievement than would be possible with 
the results of just one assessment. 

In order to provide useful guidance for comparing the results of the different assessments, the 
U.S. Department of Education’s National Center for Education Statistics (NCES) has periodically 
conducted studies comparing various assessments in terms of their underlying frameworks, items, 
and other related features. In 2003, NCES conducted two comparison studies — one in mathematics 
and one in science — following the 2003 administrations of TIMSS and PISA. This report focuses on 
a comparison of the mathematics assessments — NAEP 2003, TIMSS 2003, and PISA 2003 — while a 
companion report (Neidorf, Binkley, and Stephens 2006) compares the NAEP 2000 and TIMSS 2003 
science assessments. 

The 2003 mathematics and science comparison studies build on several earlier studies, which 
were also undertaken to explore the similarities and differences between NAEP and various 
international assessments. Such studies comparing frameworks and items are conducted periodically, 
as NAEP and international assessments evolve, improving their frameworks and test items to reflect 
current research, policy, and practice. 

Previous published studies of mathematics and science assessments included comparisons of 
the TIMSS 1995 and NAEP 1996 mathematics assessments (McLaughlin, Dossey, and Stancavage 
1997) and the NAEP 2000, TIMSS 1999, and PISA 2000 mathematics and science assessments 
(Nohara 2001). Both studies compared the underlying frameworks and test items from each 
assessment in terms of content, item format, and thinking skills required. 

There also have been several studies comparing reading assessments. The earliest of these 
compared the NAEP 1992 reading assessment and the 1991 IEA Reading Literacy Study (Binkley 
and Rust 1994). More recently, Binkley and Kelly (2003) examined the frameworks, reading 
passages and items from the NAEP 2002 and PIRLS 2001 reading assessments. 



1 PISA uses the terminology of “literacy” in each subject area to denote its broad focus on application of knowledge 
and skills; that is, PISA seeks to ask if the 15-year-olds are mathematically literate, or to what extent they can apply 
mathematical knowledge and skills to a range of different situations they may encounter in their lives. 







The goal of this mathematics comparison study is to identify similarities and differences 
between the 2003 NAEP, TIMSS, and PISA assessments based on a detailed comparison of their 
frameworks and items. This information may be used to help inform interpretations of student 
performance in mathematics on the three different assessments. While there are other important 
aspects that might be compared, such as item difficulty, sampling, and scaling procedures, this study 
focuses on a comparison of the content of the assessments. This content comparison is based on the 
main dimensions of the assessment frameworks and focuses on a comparison of the set of assessment 
items as a reflection of how the frameworks are implemented. The main questions driving the study 
are as follows: 

• How do NAEP, TIMSS, and PISA define the domain of mathematics to be assessed and its 
main content areas, in terms of both the topics that are included and the distribution of items 
across topics? 

• How do NAEP, TIMSS, and PISA define the content and process skills appropriate for the 
assessments at different grade or age levels? How do the items in each assessment compare 
to the grade-level expectations specified by the other frameworks? 2 

• How do the items in the NAEP, TIMSS, and PISA assessments compare with respect to the 
level of mathematical complexity demanded of students? 

• How do NAEP, TIMSS, and PISA compare with respect to the types and distribution of item 
formats used? How do the items in the different assessments compare in terms of their 
problem-solving contexts? 

To answer these questions, NCES convened an expert panel (appendix C) to examine the 
mathematics frameworks and items for each assessment. The panel cross-classified NAEP and 
TIMSS fourth- and eighth-grade items to each other’s assessment frameworks with respect to 
mathematics content and grade level. PISA items also were classified to the NAEP framework on 
the same dimensions. The panel classified the items from all three assessments with respect to a 
common definition of mathematical complexity level based on the NAEP 2005 framework. 3 A 
limited comparison was also made between PISA items and NAEP eighth- and twelfth-grade 
problem solving items. Although TIMSS and PISA were not compared directly, this approach 
permits the comparison of NAEP, TIMSS, and PISA through the common classification systems 
based on the NAEP framework. In addition to the classification data from the panel, the study draws 
upon information provided by the NAEP, TIMSS, and PISA assessment developers that describes 
how each item is classified according to the main dimensions of its own framework, as well as other 
relevant characteristics such as item format and scoring rubrics. 
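
As a rough illustration of how cross-classification data of this kind might be tabulated, the sketch below computes the percentage of items from one assessment that map to each content area of another framework. The item records, field names, and function name are hypothetical and are not drawn from the study's data or procedures.

```python
from collections import Counter

# Hypothetical item records (illustrative only, not study data): each item
# carries the NAEP content strand assigned to it by a reviewer, or None if
# no strand in the other framework matched.
items = [
    {"id": "T1", "survey": "TIMSS", "grade": 8, "naep_strand": "Algebra and functions"},
    {"id": "T2", "survey": "TIMSS", "grade": 8, "naep_strand": "Measurement"},
    {"id": "T3", "survey": "TIMSS", "grade": 8, "naep_strand": "Algebra and functions"},
    {"id": "T4", "survey": "TIMSS", "grade": 8, "naep_strand": None},
]


def percent_by_category(records, key):
    """Return {category: percentage of records}.

    Records whose value for `key` is None are grouped under 'Not classified'.
    An empty input yields an empty dictionary.
    """
    counts = Counter(
        r[key] if r[key] is not None else "Not classified" for r in records
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


distribution = percent_by_category(items, "naep_strand")
```

The same function could be applied separately to each survey and grade to produce distributions of the kind reported by survey and grade.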



2 The 2003 mathematics and science comparison studies are the first to compare the assessments in terms of grade 
level — the extent to which items from one assessment map to the same grade level of the framework of the other 
assessment. 

3 The rationale for using this dimension from the NAEP 2005 framework as the basis for item classifications is 
described in appendix D. 







Section 2 of this report presents an overview of the NAEP, TIMSS, and PISA assessments 
and a comparison of their respective mathematics assessment frameworks. Section 3 reviews the 
methods used for this comparison study. The results of the study are then presented in three major 
sections. The first, section 4, compares the NAEP, TIMSS, and PISA assessments overall with 
respect to content coverage, grade level, mathematical complexity level, and item format. The 
overall comparisons are followed by comparisons of the NAEP and TIMSS assessments with respect 
to each of the following main content areas (section 5): number, measurement, geometry, data, and 
algebra. This section provides more detailed comparisons of the extent to which items in one 
assessment map to the mathematics framework of the other assessment. It compares the content 
distribution of the items for each of the NAEP and TIMSS mathematics subscales. Section 6 
contains additional comparisons made between the NAEP and PISA assessments, including detail on 
the mathematics topics covered by the PISA items and how NAEP eighth- and twelfth-grade problem 
solving items compare to those included in the PISA assessment. The report concludes with a 
summary of key findings (section 7). 







2. Overview of the Assessments and Their Frameworks 



NAEP 



The National Assessment of Educational Progress (NAEP) is the United States’ source for 
nationally representative and continuing information on what American students know and can do 
and is well known as the Nation’s Report Card. NAEP policies and frameworks are established by 
an independent National Assessment Governing Board (NAGB), and the Department of Education’s 
National Center for Education Statistics (NCES) administers the assessments. For over 30 years, 
NAEP has periodically collected and reported data on achievement in reading, mathematics, science 
and other subjects, for students in fourth, eighth, and twelfth grades. The comparisons in this report 
are based on the main NAEP assessments conducted in 2003 at the fourth- and eighth-grade levels 
and in 2000 at the twelfth-grade level. 1 

The frameworks established by NAGB for all the NAEP subject areas, including 
mathematics, are based on the collaborative input of a wide range of experts and involvement by 
participants from government, education, business, and public sectors. They are informed by 
common curricular practices in the nation’s schools and ultimately are intended to reflect the best 
thinking about the knowledge, skills, and competencies needed for students to have a deep level of 
understanding at different grades and in different subject areas. 

TIMSS 



The Trends in International Mathematics and Science Study (TIMSS) is the United States’ 
source for international comparative information on mathematics and science education in the 
elementary and middle grades. TIMSS is one of the current studies conducted under the auspices of 
the International Association for the Evaluation of Educational Achievement (IEA), which has been 
conducting international comparative studies since the early 1960s, and is directed by the 
International Study Center at Boston College. TIMSS collects achievement and background data to 
provide information on trends in mathematics and science achievement over time as well as on the 
curricular, instructional, and attitudinal factors that may be related to performance. TIMSS collects 
data on a 4-year cycle, with the first administration in 1995 (at fourth, eighth, and twelfth grades), 2 
the second in 1999 (at eighth grade only), and the most recent in 2003 (at fourth and eighth grades), 
with about 50 countries participating. 

Like NAEP, the TIMSS assessments are based on collaboratively developed frameworks. In 
contrast to NAEP, however, the framework development and consensus process involved 
mathematics experts, education professionals, and measurement specialists from many countries. 



1 At the time this study was conducted, NAEP 2000 was the most recent mathematics assessment at grade 12, and 
NAEP 2003 (which did not include grade 12) was the most recent mathematics assessment at grades 4 and 8. NAEP 
long-term trend assessments in mathematics were also administered in 2003-04 but were not included in this study. 
Later, in 2005, NAEP conducted a mathematics assessment at fourth, eighth, and twelfth grades. 

2 Defined as the upper of the two grades containing the majority of 9-year-olds or 13-year-olds and the final year in 
secondary school. These are the fourth, eighth and twelfth grades in the U.S. and most other countries. TIMSS 1995 
was also administered in third and seventh grades. 







PISA 



The Program for International Student Assessment (PISA) is conducted by the Organization 
for Economic Cooperation and Development (OECD). The main objective of PISA is to provide 
regular, policy-relevant data on the “yield” of education systems, and so targets students at an age 
that is near the end of compulsory schooling in most countries (15-year-olds). PISA focuses on 
literacy — the ability to use and apply knowledge and skills to real-world situations encountered in 
adult life — in the key subject areas of reading, mathematics, and science. PISA is, thus, the United 
States’ source of comparative information on the reading, mathematical, and scientific literacy skills 
of students in the upper grades, and it provides benchmarks to international performance levels based 
on other OECD countries. The frameworks guiding the PISA assessments reflect a consensus across 
the OECD countries regarding the skills and abilities that demonstrate literacy in these key areas. 

A key design feature of PISA is its cycle of rotating emphasis among the three key 
assessment areas every three years. Each subject area is assessed in each data collection, but the 
design distinguishes between major and minor domains. When a subject is the major domain it 
comprises a relatively greater share of the total assessment time, with a larger number of items and 
an assessment framework that is more fully developed and updated. Reading literacy was the major 
domain in the first PISA assessment in 2000 (32 countries), mathematical literacy was the major 
domain in the most recent 2003 assessment (41 countries), and scientific literacy will be the major 
domain in the next assessment in 2006. 3 

Organization of the NAEP, TIMSS, and PISA 2003 Mathematics Frameworks 

Assessment frameworks define what will be assessed, including the content to be covered, 
the types of test questions, and recommendations for how the test is administered. Exhibits 1-A, 1-B, 
and 1-C compare schematically the organizing dimensions in the NAEP, TIMSS, and PISA 2003 
mathematics frameworks. These organizing dimensions provide the basic framework for the 
development of the pool of items in each assessment, and the frameworks include target percentages 
for the distribution of the assessments across the main categories in each dimension to ensure a 
balanced assessment (discussed in the following sections). 4 As seen in these exhibits, there are some 
basic organizational differences between the frameworks, especially between PISA and NAEP or 
TIMSS. 



Both the NAEP and TIMSS 2003 mathematics frameworks represented in exhibits 1-A and 
1-B are based on two main organizing dimensions, a content dimension and a cognitive 
dimension, as well as an overarching dimension (along the bottom) that defines processes that go 
across the content and cognitive categories. Both NAEP and TIMSS include five similarly labeled 
categories in the content dimension (content strands in NAEP and content domains in TIMSS) that 
correspond to major mathematics curricular areas related to number, measurement, geometry, data, 
and algebra. In the main cognitive dimensions (mathematical abilities in NAEP and cognitive 
domains in TIMSS), NAEP has three broad categories (conceptual understanding, procedural 
knowledge, and problem solving), while TIMSS has four (knowing facts and procedures, using 
concepts, solving routine problems, and reasoning). There is overlap between the categories defined 
in the cognitive dimensions in NAEP and TIMSS as well as the processes defined by the overarching 
dimensions in each assessment (mathematical power in NAEP and communicating mathematically in 
TIMSS). All items developed for NAEP and TIMSS are classified with respect to which categories 
in the two main dimensions they assess. The overarching dimensions are also considered as items 
are developed. 



3 The 2003 PISA assessment also included an additional component assessing cross-disciplinary problem solving. 
Items from this separate component were not included in this comparison study. 

4 The frameworks only provide target percentages of items or assessment time as guidelines for test development. 

In contrast to NAEP and TIMSS, the PISA mathematical literacy assessment framework 
includes three main dimensions as shown in exhibit 1-C. Like NAEP and TIMSS, there is one 
dimension related to mathematics content (overarching ideas); the four overarching ideas in PISA, 
however, do not directly correspond to the main content categories in NAEP and TIMSS. Also like 
NAEP and TIMSS, PISA includes a cognitive dimension (competency clusters). In addition to these 
two dimensions, the PISA framework includes a third main dimension related to the situations or 
contexts in which the application of mathematics concepts is required. The situations or contexts 
dimension does not have an analogue in the NAEP and TIMSS frameworks. All items developed for 
PISA are classified with respect to the main categories in each of its three dimensions. 
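
One way to picture the organizing dimensions described above is as a simple lookup structure. The sketch below records the main dimensions and categories as they are listed in exhibits 1-A, 1-B, and 1-C, and checks whether an item has been assigned a valid category in every main dimension of its own framework. The structure, the function name, and the sample item classifications are hypothetical; the overarching dimensions in NAEP and TIMSS are left out because items are not classified to them.

```python
# Main dimensions and categories as listed in exhibits 1-A, 1-B, and 1-C.
# The dictionary layout itself is an illustrative assumption.
FRAMEWORKS = {
    "NAEP": {
        "content strands": [
            "Number sense, properties, and operations",
            "Measurement",
            "Geometry and spatial sense",
            "Data analysis, statistics, and probability",
            "Algebra and functions",
        ],
        "mathematical abilities": [
            "Conceptual understanding",
            "Procedural knowledge",
            "Problem solving",
        ],
    },
    "TIMSS": {
        "content domains": ["Number", "Measurement", "Geometry", "Data", "Algebra"],
        "cognitive domains": [
            "Knowing facts and procedures",
            "Using concepts",
            "Solving routine problems",
            "Reasoning",
        ],
    },
    "PISA": {
        "overarching ideas": [
            "Change and relationship",
            "Quantity",
            "Space and shape",
            "Uncertainty",
        ],
        "competency clusters": ["Reproduction", "Connections", "Reflection"],
        "situations or contexts": [
            "Personal",
            "Educational/occupational",
            "Public",
            "Scientific",
        ],
    },
}


def is_fully_classified(survey, classification):
    """True if `classification` names a valid category for every main dimension."""
    dimensions = FRAMEWORKS[survey]
    return all(
        classification.get(dim) in categories
        for dim, categories in dimensions.items()
    )


# Hypothetical PISA item classified on all three dimensions.
pisa_item = {
    "overarching ideas": "Quantity",
    "competency clusters": "Connections",
    "situations or contexts": "Public",
}
# Hypothetical TIMSS item missing its cognitive domain.
timss_item = {"content domains": "Algebra"}
```

Here `is_fully_classified("PISA", pisa_item)` holds, while `is_fully_classified("TIMSS", timss_item)` does not, because the second item lacks a cognitive domain.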

The following sections describe and compare in more detail the mathematics assessment 
frameworks for NAEP, TIMSS, and PISA. Additional assessment framework summary documents 
that were used for the comparison study are found in appendixes A and B. 







Exhibit 1-A. NAEP mathematics framework dimensions: 2003 

Content strands                               Mathematical abilities
Number sense, properties, and operations      Procedural knowledge
Measurement                                   Conceptual understanding
Geometry and spatial sense                    Problem solving
Data analysis, statistics, and probability
Algebra and functions

Mathematical power (Reasoning, connections, communication)

NOTE: The NAEP framework is based on two main organizing dimensions, content strands and mathematical abilities, as well as 
an overarching dimension (mathematical power) that defines processes that go across the content and abilities categories. 

SOURCE: U.S. Department of Education, National Assessment Governing Board, Mathematics Framework for the 2003 National 
Assessment of Educational Progress, 2002. 



Exhibit 1-B. TIMSS mathematics framework dimensions: 2003 

Content domains      Cognitive domains
Number               Knowing facts and procedures
Measurement          Using concepts
Geometry             Solving routine problems
Data                 Reasoning
Algebra

Communicating mathematically

NOTE: The TIMSS framework is based on two main organizing dimensions, content domains and cognitive domains, as well as an 
overarching dimension (communicating mathematically) that goes across the content and cognitive categories. 

SOURCE: International Study Center, Lynch School of Education, Boston College, TIMSS Assessment Frameworks and 
Specifications 2003, 2nd ed., 2003. 



Exhibit 1-C. PISA mathematical literacy framework dimensions: 2003 

Overarching ideas          Competency clusters     Situations or contexts
Change and relationship    Reproduction            Personal
Quantity                   Connections             Educational/occupational
Space and shape            Reflection              Public
Uncertainty                                        Scientific

NOTE: The PISA framework is based on three main organizing dimensions: overarching ideas (content), competency clusters, and situations or 
contexts. 

SOURCE: Organization for Economic Cooperation and Development (OECD), Program for International Student Assessment (PISA), The PISA 
2003 Assessment Framework: Mathematics, Reading, Science and Problem Solving Knowledge and Skills, 2003. 
















